‘All models are wrong but some are useful’


George Box 


Introduction 

These are heady times for biologists, with new methods
offering unprecedented experimental insight into low-level
biological mechanisms. Gene editing and conditional gene
expression allow precise examination of
the role of specific genes at particular times and in 
particular tissues [1]. Optogenetic methods provide exqui- 
site control over the activation or inhibition of selected 
cells or cell types [2]. Non-invasive brain imaging allows 
us to peer into the brains of both humans and non-human 
animals [3,4°,5°°,6°]. Given these massive methodological 
advances, it is fair to ask what the payoff has been in our
understanding of cognition, particularly complex cogni- 
tion (e.g., language, tool use or social cognition). The 
honest answer must be that the payoffs have been modest 
so far. Even for tiny organisms like the roundworm Cae- 
norhabditis elegans, where all 302 neurons are identified and 
their entire connectome of ~7000 synapses has been 
known precisely for years [7], neurobiologists are still 
grappling to understand the computations underlying 
the worm’s ‘decision’ to move forward or backward [8]. 
It has even been suggested that this vast factual database 
has not helped us understand behaviour [9]. 


When we turn to animal cognition in relation to human 
language, the situation appears dire indeed. A constant
flow of high-profile papers proclaims animal
‘precursors’ or abilities ‘relevant to’ advanced cognition,
but these turn out, upon closer inspection, to be either
methodologically flawed or of questionable relevance. This has led
some prominent commentators to suggest that this whole 
line of research may be a dead end, and that studies of 
animal communication have nearly nothing to teach us 
about the important aspects of human language [10], or 
even more critically, that modern neurobiology has no 
clue about fundamental issues concerning the neurobio- 
logical basis of memory [11]. 


In this essay I argue that such pessimism is unwarranted. 
Despite the need for more cautious interpretation, studies 
of animal cognition can play a crucial role in understand- 
ing advanced human cognition, and indeed they must 
play a central role in understanding its neural implemen- 
tation. The crucial issue impeding progress at the 
moment is a dearth of explicit computational models that 
are tightly linked to implementational hypotheses via 
specific algorithmic proposals [12°]. In those few happy 
areas where such linking hypotheses have been proposed 
(cf. Table 1 for a sampling), the ultimate payoff has been 
substantially increased understanding. But without 
explicit, computational formulations of this sort, interspe- 
cies comparisons become subjective, imprecise, and 
unproductive. 
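
To make concrete what such a linking hypothesis can look like, the binaural sound localization entry of Table 1 (delay lines from each ear feeding coincidence detectors) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `estimate_itd`, the sample-based lags and the toy signals are hypothetical and not taken from the cited work. Each candidate lag stands in for one delay line, and the multiply-and-sum step stands in for a coincidence detector.

```python
def estimate_itd(left, right, max_lag):
    """Return the lag (in samples) by which `right` trails `left`.

    Each candidate lag plays the role of one delay line; the product
    left[t] * right[t + lag] is large only when both inputs are active
    together, which is a crude stand-in for coincidence detection.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(left)):
            u = t + lag
            if 0 <= u < len(right):
                score += left[t] * right[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical toy signals: the same pulse reaches the right ear 2 samples later.
left = [0, 1, 3, 1, 0, 0, 0, 0]
right = [0, 0, 0, 1, 3, 1, 0, 0]
itd = estimate_itd(left, right, max_lag=3)
print(itd)
```

The point of the sketch is that the task (compute an inter-aural time difference), the algorithm (compare delayed copies and pick the best coincidence) and a candidate implementation (an array of detectors, one per delay) can each be stated explicitly and compared across species.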
Table 1
Computational bridging relations: examples of hypotheses that attempt to bridge between cognitive and neural descriptions via explicit computational models.

Task description | Algorithmic level description | Engineering description | Biological implementation
Hebbian learning | Increase connection strength for neurons that fire together | Coincidence detection leads to increased coupling | LTP and NMDA receptors [13]
Binaural sound localization | Compute inter-aural time difference | Delay lines from each ear; coincidence detection | Coincidence cells and delay lines in Nucleus Laminaris [14]
Sound stream segmentation | Multi-scale sampling | Multi-frequency cortical oscillations | Delta, theta and gamma coupling [15,16°°]
Hierarchical processing | Context-free grammar | Pushdown automaton (stack) | Broca’s area stack for temporal lexicon via arcuate fasciculus [17]


To forestall misunderstanding, it is important to point out
that I am not arguing that a single trait, or neural circuit, is
the sole difference granting language to our species.
Instead,
‘Bill, who’d had more than a bit too much to drink
with John, kissed the Dean’ 


‘Bill’ and ‘kissed’ can be seen as hierarchically adjacent, 
despite the many words between them. Because of this, 
we interpret ‘Bill’ and not ‘John’ as doing the kissing, 
despite the direct sequential adjacency of ‘John’ and 
‘kissed’. While modern linguists disagree about many 
things, there is virtually universal agreement on this 
point, despite a persistent tendency for those in other 
fields to conflate syntax with word sequence. 
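
The contrast between sequential and hierarchical adjacency can be illustrated with a small Python sketch. The bracketing below is a deliberately simplified toy parse that I assume purely for illustration (the labels `S`, `NP`, `VP` and `RelClause`, and the helper names, are hypothetical and not a full linguistic analysis). It shows that the word immediately preceding ‘kissed’ in the string differs from the head of the phrase that combines with the verb phrase in the tree.

```python
# Toy tree: (label, child, child, ...); leaves are plain strings.
tree = (
    "S",
    ("NP", "Bill",
        ("RelClause", "who'd", "had", "more", "than", "a", "bit",
         "too", "much", "to", "drink", "with", "John")),
    ("VP", "kissed", ("NP", "the", "Dean")),
)


def leaves(node):
    """Flatten a tree into its left-to-right word sequence."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def head_word(node):
    """Toy head rule (an assumption): the first word directly under the node."""
    for child in node[1:]:
        if isinstance(child, str):
            return child
    return None


words = leaves(tree)
sequential_neighbour = words[words.index("kissed") - 1]  # nearest word in the string
subject_np, verb_phrase = tree[1], tree[2]
hierarchical_subject = head_word(subject_np)             # sister phrase of the VP

print(sequential_neighbour, hierarchical_subject)
```

A purely sequence-based reader would pair ‘John’ with ‘kissed’; reading off the tree pairs ‘Bill’ with ‘kissed’, which matches the interpretation speakers actually arrive at.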


A related formal insight concerns the computational 
requirements for flexible hierarchical structure processing: 
a hierarchical system capable of processing arbitrary opera- 
tions over trees requires computational abilities over and 
above those characterizing the simplest computational 
systems, at the ‘regular’ or ‘finite state’ level [26,27]. Finite 
state automata (FSA) can flexibly process sequences, and a 
single (non-embedded) level of grouping, but cannot deal 
with more than one level of nesting (and are thus unable to 
correctly parse the sentence above). There are multiple 
well-defined levels of processing above this regular level, 
including context-free grammars (corresponding to push- 
down automata) and context-sensitive grammars (linear 
bounded automata). I will refer to these as ‘supra-regular,’ 
remaining agnostic about precisely where in this frame- 
work a particular structure or language lies. 
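
The regular versus supra-regular contrast can be made concrete with two toy string languages that are standard in formal language theory; their use here is my illustrative assumption, and the function names are hypothetical. The first recognizer needs only two control states. The second must match the number of `a`s against the number of `b`s for unbounded n, which no fixed set of states can do, so it keeps an extra counter.

```python
def accepts_ab_repeated(s):
    """Finite-state recognizer for (ab)*: two states, no extra memory."""
    state = 0  # 0 = expecting 'a', 1 = expecting 'b'
    for ch in s:
        if state == 0 and ch == "a":
            state = 1
        elif state == 1 and ch == "b":
            state = 0
        else:
            return False
    return state == 0


def accepts_a_n_b_n(s):
    """Recognizer for a^n b^n (n >= 1): finite control plus an unbounded counter."""
    count = 0
    i = 0
    while i < len(s) and s[i] == "a":  # count the leading a's
        count += 1
        i += 1
    if count == 0:
        return False
    while i < len(s) and s[i] == "b":  # cancel one a per b
        count -= 1
        i += 1
    return i == len(s) and count == 0


print(accepts_ab_repeated("ababab"), accepts_a_n_b_n("aaabbb"))
```

Note that if n is capped at some small value, a finite-state device with enough states could also handle the second pattern; the formal distinction concerns generalization to arbitrary depth.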


Crucially, all supra-regular computational systems have at 
their heart an FSA; this finite state machine is then
augmented by some additional form of memory, such 
as a counter, stack, or queue, which enables intermediate 
results to be stored indefinitely during ongoing sequential
processing. A large formal literature details the computa- 
tional effects of adding different forms of memory [28,29], 
but here I will skirt around these issues and simply state 
that supra-regular processing requires, beyond the core 
FSA, some additional memory store, such as a stack or 
equivalent (cf. [17]). This has been common knowledge 
in computational linguistics for many decades. 
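
As a minimal sketch of ‘finite control plus a stack’, the following Python function recognizes properly nested brackets of two kinds. The bracket alphabet and the function name are assumptions chosen for illustration. The loop and its branching play the role of the finite-state core, while the list `stack` is the additional memory that holds intermediate results until the matching closing element arrives.

```python
CLOSERS = {")": "(", "]": "["}  # each closing symbol and the opener it must match


def accepts_nested(s):
    """Pushdown-style recognizer for nested brackets such as '([()])'."""
    stack = []  # auxiliary memory beyond the finite control
    for ch in s:
        if ch in "([":
            stack.append(ch)  # remember an open dependency
        elif ch in CLOSERS:
            # the most recent open dependency must be the one being closed
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
        else:
            return False  # symbol outside the toy alphabet
    return not stack  # nothing may remain unresolved


print(accepts_nested("([()])"), accepts_nested("([)]"))
```

Because a stack is last-in, first-out, it enforces nested dependencies and rejects crossed ones such as `([)]`; swapping in a different memory type (for example a queue) changes which dependency patterns can be handled.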


A decade or so ago, biologists [5°°,30-33] began a research 
programme aiming to augment this formal and linguistic 
picture by comparing human processing abilities with 
those of non-human animals (‘animals’ hereafter). What 
kinds of computational resources do animals bring to bear 
when they process sensory patterns? What kinds of gen- 
eralizations do they make, and what cues are relevant? 
Results of this research programme have been reviewed 
in several places [34,35°,36]. 


unconvincing by later analyses and critique [37,38]. So 
supra-regular processing capabilities appear to represent a 
clear and well-defined distinction between animals and 
humans, directly relevant to human language abilities. 


This is 


The next obvious question is 
‘implemented in human brains?” Here, decades of work in 
both aphasia and brain imaging paint a rather clear and 
consistent picture that Broca’s area (i.e., Brodmann’s
areas 44 and 45), via its connections with other brain
regions, plays an important role (cf. [39-43]). To give just 
a few examples, when the size of hierarchical chunks 
being processed by participants in an fMRI experiment is 
systematically varied, independent of semantics, 
increased chunk size leads to a steady increase in the 
activation of Broca’s area [44]. When the syntactic com- 
plexity of German sentences is increased by using atypi- 
cal syntactic framing of the same meaning, Broca’s area is 
preferentially activated for more complex sentences [45]. 
These examples could be multiplied considerably, and 
are consistent with the idea that Broca’s region plays a 
central role in processing hierarchical structure during 
sentence processing in humans. 


Furthermore, 


‘regions. Connectivity between frontal and temporal 
regions in non-human primates is heavily weighted 
towards a ventral pathway, shared with humans. By con- 
trast, a dorsal pathway linking Broca’s region to parietal, 
occipital, and temporal regions, the ‘arcuate fasciculus’, is
uniquely strongly developed in humans [47]. Experiments 
contrasting simple sequence processing with hierarchical 
processing show a specific reliance of the latter on this 
novel dorsal pathway [48]. Direct comparisons of monkeys 
and humans processing auditory sequences found Broca’s 
activation, in humans only, for sequences demanding 
simultaneous attention to pattern and number [5°°]. 


This led me to propose 


Either Broca’s represents a 
single general-purpose stack, able to store intermediate 
results for any hierarchical computation, or it incorporates 
multiple specialized stacks holding different types of 
material, such as syllables (for phonological grouping), 
words and phrases (for syntactic structure) and semantic 
constituents (for semantic interpretation). The second 
hypothesis, at present, seems more likely to me and is 


consistent with the suggestion that Broca’s region repre-, 


Also consistent with this hypothesis is the finding that 
only sub-parts of Broca’s region are activated in specific 
linguistic tasks [45] and that this region is distinctly 
parcellated, based on receptor-based mapping [50,51]. 
More research is required to resolve this open issue about 
the number and nature of stacks in Broca’s region. 


In summary, the dendrophilia hypothesis provides both an
explicit computational characterization of how the abili- 
ties underlying human phrasal syntax differ from animal 
sequence-processing abilities and a specific implementa- 
tional model of how this difference is implemented in the 
human brain. However, as it stands, the dendrophilia 
hypothesis leaves open the question of precursors, that 
is, from what pre-existing neural and behavioural basis 
were these putatively species-specific capabilities 
derived? I will now address this question. 


Phonology versus syntax: concatenation 
versus embedding 

Although any rule-governed activity can be captured by
some grammar, it has been convincingly argued that all
phonological phenomena in human natural language can
be processed at the regular level [52,53], while supra-regular
grammars are needed to deal with phrasal syntax.
This key distinction is illustrated, mapped on to the
formal language hierarchy, in Figure 1. But how are these
abstract computational differences implemented
neurally?


Figure 1
The extended formal language hierarchy. A visual representation of some key components of the formal language hierarchy (aka ‘Extended
Chomsky Hierarchy’), showing where different exemplary linguistic phenomena lie in terms of computational complexity. Crucially, phonological
phenomena all lie somewhere in the sub-regular region (meaning they can be computed by some finite state automaton) while many syntactic
phenomena require a more powerful system (a pushdown automaton or better).
Modified from Ref. [53].


It has been suggested on formal grounds that the compu- 
tational difference between phonology and syntax could 
boil down to a difference in the combinatoric operation 
required, changing the type of structures operated upon
[54°]. In phonology, the fundamental operation is concat-


The key syntactic dimension is thus the vertical par- 
ent-child relation (aka the dominance/subordination 
dimension), rather than the ‘sister’ relations between 
adjacent terminal nodes representing words or morphemes.
Discussions of the ‘chain of command’ (‘Susan
oversees Sam oversees Sally’) or ancestry (‘Adam begat
Seth, and Seth begat Enos ...’) imply such a vertical
readout of a hierarchical combination of elements. 
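
The ‘vertical readout’ in these examples can be sketched as a walk along parent links. The dictionaries and function name below are hypothetical illustrations: each entry maps an element to the one that immediately dominates it, and the function follows those links upward, ignoring any left-to-right order among sisters.

```python
# Each key is dominated by its value (child -> parent).
oversees = {"Sam": "Susan", "Sally": "Sam"}
begat = {"Seth": "Adam", "Enos": "Seth"}


def dominance_chain(element, parent_of):
    """Return the chain from the topmost ancestor down to `element`."""
    chain = [element]
    while chain[-1] in parent_of:  # climb one level per step
        chain.append(parent_of[chain[-1]])
    return chain[::-1]


print(dominance_chain("Sally", oversees))
print(dominance_chain("Enos", begat))
```

Nothing in this readout depends on the order in which the elements were encountered, which is the sense in which the relevant dimension is dominance rather than sequence.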


Extending this idea in a neural direction, I suggest that 
our ability to process hierarchical structure required the 
conversion of computational circuitry competent at 
sequential processing into a circuit specialized for
hierarchical processing in this vertical dimension (for
potential sequential precursors see [5°°,6°,16°°,42]). By 
this hypothesis, our massively expanded Broca’s area 
allows us to implement supra-regular processing via its 
robust dorsal connections to parietal and temporal cortex 
(cf. [55]). That is, human brain evolution ‘hijacked’ pre- 
existing primate sequencing capabilities and simple 
sequential grouping capabilities already present in our 
common ancestor with chimpanzees, and modified them 
to implement hierarchical multi-level grouping. Specifi- 
cally, I suggest that the syntax/phonology distinction 
arose via a duplication of regular-level sequential cir- 
cuitry, followed by a differentiation of these circuits to 
focus on hierarchical structure (and to be relatively ‘blind’ 
to sequence). However, the circuitry necessary to recog- 
nize items and process sequences, requiring only finite- 
state resources, and implemented mainly via ventral 
pathways [39], stayed essentially the same. Thus I 
hypothesize that the finite-state machinery at the heart 
of any computing system changed very little 
(‘phonological continuity’) while the novel ‘stack’ com- 
ponent needed for dendrophilia, instantiated in Broca’s 
area via its dorsal connections to the parietal and temporal 
cortices, arose via neural circuit duplication and 
divergence. 


Such duplication-with-differentiation events are known 
to play an important role in molecular evolution [56,57], 
and have been suggested to play an evolutionary role in 
neural diversification as well. For example, trichromatic
vision in many primates may result from a gene duplica- 
tion of a photoreceptor pigment gene and subsequent 
differentiation into long-wavelength and middle-wave- 
length variants [58,59]. Neurally, Jarvis and colleagues 
have suggested that the song system in vocal learning 
birds results from duplication and differentiation of gen- 
eral motor system circuitry [60,61]. Friederici and collea- 
gues suggest that, in addition to the ventral pathway 
required for sequential processing, there are in fact two
dorsal pathways: one present at birth and required for 
vocal learning, and a later-maturing second pathway 
specifically involved in supra-regular syntax [55,62,63]. 
Finally, it has recently been proposed that the novel 
dorsal human laryngeal motor cortex results from a dupli- 
cation and differentiation of the ventral laryngeal cortex 
shared with other primates [64°]. Thus, numerous 
hypotheses have the common theme of duplication of 
either genes or neural circuits (perhaps both), with sub- 
sequent differentiation. 


The phonological continuity hypothesis 


‘mechanisms. The argument that our ‘sequential brain’ 
is shared with other species is of course a strong claim: its 
purpose is to focus the attention of phonologists and 
animal-cognition researchers on this important but almost 
completely neglected research topic. 


Two immediate potential objections to this strong word- 
ing concern absence of evidence and defining homology. 
It is of course a challenge to demonstrate the absence of 
any ability — the fact that some individuals fail to learn 
something in some experimental context is a null result 
that could stem from experimenters’ ineptitude at design- 
ing the experiment and/or participants’ lack of motivation, 
rather than true inability. Thus, a convincing demonstra- 
tion of inability should involve paired tasks that differ in 
only one key respect, where subjects succeed on the 
control task and fail on the focal task (e.g., [30]); only
then can we conclude that experimental conditions were
adequate for learning and generalization by members of
this species, using this category of sounds and experimen- 
tal paradigm. The more closely the two tasks are matched, 
the more convincing is the argument that failure on the 
focal task indicates inability, particularly if this failure 
continues even with training or extra data presentation. 


Regarding homology, it is always possible that two species 
accomplish the same task using drastically different algo- 
rithms or neural mechanisms, so behavioural success 
would be only one criterion for defining ‘continuity’ by 
the PCH; homology is another key requirement. While 
this might seem to require an exclusive focus on nonhu- 
man primates or mammals, I think such a focus would be 
premature: birds are often more tractable study organisms 
for auditory tasks, and many aspects of the avian brain are 
indeed homologous to those of mammals despite superfi- 
cial differences [66-68]. There are few grounds at present 
for believing that the ‘sequential brain’ underlying finite- 
state processing is limited to primates or mammals (cf. 
[36,69,70]), and arguments for convergent evolution in 
birds and mammals are premature without a better under- 
standing of such processing in more basal vertebrates 
(e.g., non-avian reptiles like crocodilians). 


It is worth emphasizing that the PCH adopts a quite 
literal viewpoint concerning formal language theory, and 
this is at variance with some traditional (and perhaps more 
intuitive) notions about the phonology/syntax distinction. 


indeed: musical notes or visual icons. This proposed
mapping of the formal automaton onto proposed neural
circuitry is illustrated in Figure 2.


My fundamental goal in proposing the PCH is to help
isolate explicitly defined and experimentally tractable
phonological phenomena that animals can master, and
to determine the underlying computations, in direct
comparison with humans (cf. [6°]). This would enable
neural investigations of the underlying mechanisms, providing
a firmer basis to specify what, precisely, needed to
change during human evolution. Far too little is currently
known about animal phonological processing abilities to
make any firm pronouncements, but to the extent
that the PCH is correct, everything we learn about animal
phonology and its neural implementation would have
direct payoffs in our understanding of human syntactic
processing.


Figure 2
Mapping regular and supra-regular processing components onto the
brain. Panel (a) shows the components of a pushdown automaton, the
simplest supra-regular computational model; (b) shows a human brain
with postulated color-coded equivalences. The phonological continuity
hypothesis suggests that sequential processing at the finite-state
(sub-regular) computational level is accomplished via a fronto-temporal
circuit connected via the ventral pathway shared with other primates
(b, blue), while the auxiliary memory (pushdown stack) required for
supra-regular processing is implemented in Broca’s region of the
inferior frontal cortex (b, green), via dorsal connections strongly
developed only in humans (red arrows).


Conclusion 
These are exciting times in neurobiology, given the
astounding power of modern methods to fuel increased
understanding of low-level neural implementation. But
applying these tools to topics relevant to the biology and
evolution of language remains an important challenge. If,
as many researchers have suggested, human syntactic
capabilities really are unique, it may appear to doom
all studies of ‘animal syntax’ to irrelevance. I have argued
above against this pessimistic view, and outlined a productive
middle road for future research, which is based on
explicit computational considerations and is consistent 
with data from linguistics, animal cognition, and neuro- 
science. The PCH lays out a panoply of clear testable 
predictions and provides the starting point for a research 
programme that offers a way out of this apparent impasse. 
Even if, as I suspect, some details of this model turn out to 
be wrong, I predict that the model will at least be useful. 